Abstract: The Poisson distribution is often used as a standard model for
count data. Quite often, however, such data sets are not well fit by a Poisson
model because they have more zeros than are compatible with this model.
For these situations, a zero-inflated Poisson (ZIP) distribution is often proposed.
This article addresses testing a Poisson versus a ZIP model, using
Bayesian methodology based on suitable objective priors. Specific choices of
objective priors are justified and their properties investigated. The methodology
is extended to include covariates in regression models. Several applications
are given.
1. Introduction

The Poisson distribution is often used as a standard probability model for count 
data. For example, a production engineer may count the number of defects in items 
randomly selected from a production process. Quite often, however, such data sets
are not well fit by a Poisson model because they contain more zero counts than are
compatible with the Poisson model. An example is again provided by the production
process; indeed, according to Ghosh et al. [14], when some production processes
are in a near-perfect state, zero defects will occur with a high probability. However,
random changes in the manufacturing environment can lead the process to
an imperfect state, producing items with defects. The production process can move
randomly back and forth between the perfect and the imperfect states. For this
type of production process many items will be produced with zero defects, and this
excess might be better modeled by a ZIP distribution than a Poisson distribution.
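For $0 < p < 1$ and $\lambda > 0$, the ZIP($\lambda, p$) distribution has the probability function

$$ f_1(x \mid \lambda, p) = p\, I(x = 0) + (1 - p)\, f_0(x \mid \lambda), \qquad x = 0, 1, 2, \ldots, \tag{1.1} $$

where $I(\cdot)$ is the indicator function and $f_0(x \mid \lambda)$ is the Poisson probability function

$$ f_0(x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots. \tag{1.2} $$

The parameter $p$ is referred to as the zero-inflation parameter.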
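A minimal Python sketch of (1.1) and (1.2) may help fix ideas. The function names `poisson_pmf` and `zip_pmf` are illustrative choices, not part of the original method, and the sketch assumes $\lambda > 0$ and $0 \le p < 1$ (allowing $p = 0$ so that the Poisson case can be recovered).

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability function f_0(x | lambda), equation (1.2)."""
    # Work on the log scale so that lam**x and x! do not overflow for large x.
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def zip_pmf(x: int, lam: float, p: float) -> float:
    """ZIP probability function f_1(x | lambda, p), equation (1.1)."""
    point_mass = p if x == 0 else 0.0   # extra mass p placed at zero
    return point_mass + (1.0 - p) * poisson_pmf(x, lam)

lam, p = 1.5, 0.3
print(zip_pmf(0, lam, p))                              # p + (1 - p) * exp(-lam)
print(sum(zip_pmf(x, lam, p) for x in range(100)))     # total mass, essentially 1
```

Setting `p = 0` recovers the Poisson probability function, which is the nested model tested below.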

Many authors have used the ZIP distribution, with and without covariates, to model
count data. In ZIP regression models, Lambert [18] used a frequentist approach
and Ghosh et al. [14] used a Bayesian approach to analyze industrial data sets.

While the aforementioned authors used the ZIP model to analyze their data, a
number of authors have addressed the problem of checking whether a ZIP model is
needed to model the data. From the frequentist perspective, score tests have been
developed for testing the hypothesis $H_0: p = 0$ vs. $H_1: p \neq 0$ in a ZIP regression
model ([10], [12]). From the Bayesian perspective, Bhattacharya et al. [9] presented
a Bayesian method to test $p \le 0$ versus the alternative $p > 0$ by computing a certain
posterior probability of the alternative hypothesis. As in [10] and [12], $p$ is allowed to
be negative in their model [9], as long as $p + (1 - p)e^{-\lambda} > 0$.

In this paper, we consider Bayesian testing of $M_0$ versus $M_1$ given by

$$ M_0: X_i \overset{iid}{\sim} f_0(\cdot \mid \lambda), \qquad i = 1, \ldots, n, \tag{1.3} $$

$$ M_1: X_i \overset{iid}{\sim} f_1(\cdot \mid \lambda, p), \qquad i = 1, \ldots, n, \tag{1.4} $$

where $f_0$ and $f_1$ are given in (1.2) and (1.1), respectively. Note that, as opposed to the
situations in the papers mentioned above, $p < 0$ is not possible here. Indeed, we
can alternatively formulate the problem as that of testing, within the ZIP model,

$$ H_0: p = 0 \quad \text{versus} \quad H_1: p > 0. $$

Unlike the analysis in [9], $p = 0$ (i.e., the Poisson model) is assumed to have positive
prior probability (e.g., prior probability 1/2).

In Section 2 we develop the suggested objective testing of Poisson versus ZIP 
models when not all counts are zeros. For all zeros, the ZIP distribution is not 
identifiable, and a proper prior is required for all parameters; we address this in 
Section 5. Section 3 is devoted to some comparative examples. We consider inclusion 
of covariates in Section 4, where we address the testing of Poisson versus ZIP
regression models and give an example involving AIDS-related deaths in men. In the
regression case, in order for the objective Bayesian model selection to be successful,
we need enough positive counts so that the design matrix based on the positive
counts has full column rank. When this condition does not hold, we suggest in Section 5
a partially proper prior on the regression parameters to be used for model selection.
Proofs and technical details are relegated to an Appendix. 
2. Formulation of the problem

The Bayesian methodology for choosing between two models for some data is con- 
ceptually very simple (see, e.g., [3]). One assesses the prior probabilities of each 
model, the prior distributions for the model parameters, and computes the pos- 
terior probabilities of each model. These posterior probabilities can be computed 
directly from the prior probabilities and the Bayes factor, an (integrated) likelihood
ratio for the models which is very popular in Bayesian testing and model
selection. 

Often it is not possible (for lack of time or resources) to carefully assess in a sub- 
jective manner all the needed priors. In these situations, very satisfactory answers 
are provided by objective Bayesian analyses that do not use external information 
other than that required to formulate the problem (see [4]). First we review below 
some difficulties of model selection via objective Bayesian analysis. Then we justify 
the objective prior we chose for our problem, derive the corresponding Bayes Factor 
and study properties of the prior and the Bayes factor. 



2.1. Bayesian model selection and Bayes factors 

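To compare two models, $M_0$ and $M_1$, for the data $X = (X_1, \ldots, X_n)$, the Bayesian
approach is based on the Bayes factor $B_{10}$ of $M_1$ to $M_0$ given by

$$ B_{10} = \frac{m_1(x)}{m_0(x)} = \frac{\int f_1(x \mid \theta_1)\, \pi_1(\theta_1)\, d\theta_1}{\int f_0(x \mid \theta_0)\, \pi_0(\theta_0)\, d\theta_0}, \tag{2.1} $$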

where, under model $M_i$, $X$ has density $f_i(x \mid \theta_i)$ and the unknown parameters $\theta_i$ in
$M_i$ are assigned a prior density $\pi_i(\theta_i)$, $i = 0, 1$. For given prior model probabilities
$\Pr(M_0)$ and $\Pr(M_1) = 1 - \Pr(M_0)$, the posterior probability of, say, $M_0$ is

$$ \Pr(M_0 \mid x) = \left[ 1 + \frac{\Pr(M_1)}{\Pr(M_0)}\, B_{10} \right]^{-1}. \tag{2.2} $$

In objective Bayesian analyses, $\pi_i(\theta_i)$ is chosen in an objective or conventional
fashion, and the hypotheses are typically assumed to be equally likely a priori.
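Equation (2.2) is a one-line computation. The sketch below (the function name is hypothetical) converts a Bayes factor $B_{10}$ and a prior probability $\Pr(M_0)$ into $\Pr(M_0 \mid x)$. With equal prior probabilities it reduces to $1/(1 + B_{10})$.

```python
def posterior_prob_m0(b10: float, prior_m0: float = 0.5) -> float:
    """Posterior probability Pr(M0 | x) from equation (2.2).

    b10: Bayes factor of M1 to M0, m1(x) / m0(x).
    prior_m0: prior probability Pr(M0); Pr(M1) = 1 - Pr(M0).
    """
    if not 0.0 < prior_m0 < 1.0:
        raise ValueError("prior_m0 must lie strictly between 0 and 1")
    prior_odds_m1 = (1.0 - prior_m0) / prior_m0   # Pr(M1) / Pr(M0)
    return 1.0 / (1.0 + prior_odds_m1 * b10)

# Equal prior probabilities and the Bayes factor reported in Example 3.1.
print(round(1.0 - posterior_prob_m0(223.13), 4))
```

With $B_{10} = 223.13$ this gives $\Pr(M_1 \mid x) \approx 0.9955$.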

Use of objective priors has a long history in Bayesian inference (see, for ex- 
ample, [8] and [17] for justifications and references). They are, however, typically 
improper and are only defined up to an arbitrary multiplicative constant. This is 
not a problem in the posterior distribution, since the same constant appears in both 
the numerator and the denominator of Bayes theorem and so cancels. In model se- 
lection and hypothesis testing, however, it can be seen from (2.1) that when at 
least one of the priors $\pi_i(\theta_i)$ is improper, the arbitrary constant does not cancel,
so that the Bayes factor is then arbitrary and undefined. An important exception 
to this arises in invariant situations for parameters occurring in all of the models; 
Berger et al. [7] show that use of the (improper) right Haar invariant prior is then 
permissible. 

One of the ways to address this difficulty is to try to directly "fix" the Bayes 
factor by appropriately choosing the multiplicative constant, as in [l.')]. Popular 
methods (the intrinsic Bayes factor [5] and the fractional Bayes factor [20]) for 
fixing this constant arise as a consequence of "training" the improper priors into 
proper priors based on part of the data or of the likelihood. We refer to Berger 
and Pericchi [6] for a review, references and comparisons. Another possibility is
to directly derive appropriate "objective" but proper distributions $\pi_i(\theta_i)$ to use in
model selection; see [2] and [15] for methods and references. This is the approach 
taken in this paper (with a slight exception in Section 5). 

2.2. Specification and justification of the objective priors 

Returning to the testing of the Poisson ($M_0$) vs. the ZIP ($M_1$) models, i.e., testing

$$ M_0: f_0(x \mid \lambda) \quad \text{vs.} \quad M_1: f_1(x \mid \lambda, p), \tag{2.3} $$

the key issue is the choice of the priors $\pi_0(\lambda)$ and $\pi_1(\lambda, p) = \pi_1(\lambda)\, \pi_1(p \mid \lambda)$.

A frequent simplifying procedure (both for subjective and objective methods)
is to take $\pi_0(\lambda)$ equal to $\pi_1(\lambda)$, that is, to give the same prior to the parameters
occurring in all models under consideration. This, however, may be inappropriate,
since $\lambda$ might have entirely different meanings under model $M_0$ and under model
$M_1$; the fact that we have used the same label does not imply that they have the
same meanings. This frequent mistake is discussed, for example, in [7].

It has been argued that, if the common parameters are orthogonal to the
remaining parameters in each model (that is, the Fisher information matrix is block
diagonal), then they can be assigned the same prior distribution ([15], [16]). In this
case, improper priors can be used, since the arbitrary constant would cancel in the 
Bayes factor. 

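Unfortunately, $p$ and $\lambda$ in the ZIP model are not orthogonal. We therefore first
reparameterize the original model. With $p^* = p + (1 - p)e^{-\lambda}$, we rewrite $f_1(x \mid \lambda, p)$ as

$$ f_1^*(x \mid \lambda, p^*) = p^*\, I(x = 0) + (1 - p^*)\, f_0^T(x \mid \lambda), \qquad x = 0, 1, 2, \ldots, \tag{2.4} $$

where $f_0^T(x \mid \lambda) = f_0(x \mid \lambda)/(1 - e^{-\lambda})$, $x = 1, 2, \ldots$, is the zero-truncated Poisson
distribution with parameter $\lambda$. Note that $p^* > e^{-\lambda}$. We can trivially express the
Poisson ($M_0$) model as

$$ f_0^*(x \mid \lambda) = e^{-\lambda}\, I(x = 0) + (1 - e^{-\lambda})\, f_0^T(x \mid \lambda), \qquad x = 0, 1, 2, \ldots, \tag{2.5} $$

and now it can intuitively be seen that $\lambda$ has the same meaning in both $f_1^*$ and $f_0^*$.
Indeed, the Fisher information matrix for $(p^*, \lambda)$ can be checked to be diagonal.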
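The reparameterization can be checked numerically. The sketch below uses hypothetical helper names and assumes $\lambda > 0$ and $0 \le p < 1$. It evaluates (1.1) and (2.4) side by side with $p^* = p + (1 - p)e^{-\lambda}$, which works because $1 - p^* = (1 - p)(1 - e^{-\lambda})$. Taking $p = 0$, i.e. $p^* = e^{-\lambda}$, gives (2.5).

```python
import math

def poisson_pmf(x, lam):
    """Poisson f_0(x | lambda), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def zt_poisson_pmf(x, lam):
    """Zero-truncated Poisson f_0^T(x | lambda); no mass at x = 0."""
    if x == 0:
        return 0.0
    return poisson_pmf(x, lam) / -math.expm1(-lam)   # divide by 1 - exp(-lam)

def zip_pmf(x, lam, p):
    """ZIP in the original (lambda, p) parameterization, (1.1)."""
    return (p if x == 0 else 0.0) + (1.0 - p) * poisson_pmf(x, lam)

def zip_pmf_orth(x, lam, p_star):
    """ZIP in the orthogonal (lambda, p*) parameterization, (2.4)."""
    return (p_star if x == 0 else 0.0) + (1.0 - p_star) * zt_poisson_pmf(x, lam)

def to_p_star(p, lam):
    """p* = p + (1 - p) exp(-lambda)."""
    return p + (1.0 - p) * math.exp(-lam)

for lam, p in [(0.4, 0.2), (2.0, 0.0), (5.0, 0.7)]:
    gap = max(abs(zip_pmf(x, lam, p) - zip_pmf_orth(x, lam, to_p_star(p, lam)))
              for x in range(30))
    print(lam, p, gap)   # differences at rounding-error level
```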

With an orthogonal reparameterization, Jeffreys (1961) recommended using (i)
the Jeffreys prior (the square root of the Fisher information) for the "common" parameters;
and (ii) a reasonable proper prior for the extra parameters in the more complex 
model. 

The situation here is very unusual, however, in that the Jeffreys prior for the
"common" $\lambda$ is different for each model. The Jeffreys prior for $\lambda$ in the Poisson
model is well known to be $\pi_J^0(\lambda) = 1/\sqrt{\lambda}$, whereas the Jeffreys prior for the
orthogonalized ZIP model is easily shown to be the same as the Jeffreys prior for the
truncated distribution $f_0^T(x \mid \lambda)$, which is

$$ \pi_J^1(\lambda) = \frac{k(\lambda)}{\sqrt{\lambda}}, \qquad \text{where} \quad k(\lambda) = \frac{\sqrt{1 - (1 + \lambda)e^{-\lambda}}}{1 - e^{-\lambda}}. $$

That these priors are different after orthogonalization is highly unusual and can
be traced to the fact that $\lambda$ also enters into the definition of the nested model,
through $p^* = e^{-\lambda}$. In any case, we are left without clear guidance as to whether $\pi_J^0$
or $\pi_J^1$ should be used as the prior for $\lambda$. (Note that, in computing the Bayes factor,
the same prior for $\lambda$ must be used in both the numerator and the denominator;
otherwise one is facing the indeterminacy issues discussed earlier.)
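The closed form of $\pi_J^1$ can be checked against the definition of the Jeffreys prior. The sketch below uses hypothetical names, and the truncation point `x_max` of the infinite sum is an assumption that is ample for moderate $\lambda$. It computes the Fisher information of $f_0^T(x \mid \lambda)$ as the expected squared score, using the score $x/\lambda - 1/(1 - e^{-\lambda})$, and compares its square root with $k(\lambda)/\sqrt{\lambda}$.

```python
import math

def k_factor(lam):
    """k(lambda) = sqrt(1 - (1 + lambda) e^{-lambda}) / (1 - e^{-lambda})."""
    return math.sqrt(1.0 - (1.0 + lam) * math.exp(-lam)) / -math.expm1(-lam)

def ztp_fisher_info(lam, x_max=400):
    """Fisher information of the zero-truncated Poisson, E[score^2], by summation."""
    norm = -math.expm1(-lam)                 # 1 - exp(-lam)
    total = 0.0
    for x in range(1, x_max + 1):
        pmf = math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1)) / norm
        score = x / lam - 1.0 / norm         # derivative of log f_0^T(x | lambda)
        total += pmf * score ** 2
    return total

for lam in (0.1, 1.0, 5.0):
    print(lam, math.sqrt(ztp_fisher_info(lam)), k_factor(lam) / math.sqrt(lam))
```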
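Under the orthogonalized ZIP model, we also need to specify a proper prior for
$p^*$ given $\lambda$, which we propose to take uniform over the interval $(e^{-\lambda}, 1)$, that is,

$$ \pi_1^*(p^* \mid \lambda) = \frac{1}{1 - e^{-\lambda}}\, I(e^{-\lambda} < p^* < 1). $$

We can thus write the overall priors being considered for the two models $f_0^*(x \mid \lambda)$
and $f_1^*(x \mid \lambda, p^*)$ as, respectively,

$$ \pi_0^l(\lambda) = \frac{k(\lambda)^l}{\sqrt{\lambda}}, \qquad \pi_1^{*l}(\lambda, p^*) = \frac{k(\lambda)^l}{\sqrt{\lambda}}\, \frac{I(e^{-\lambda} < p^* < 1)}{1 - e^{-\lambda}}, $$

where $l$ is 0 or 1 as we utilize one or the other of the two Jeffreys priors for $\lambda$.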

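It is computationally more convenient to work in the original $(p, \lambda)$ parameterization.
A change of variables above then results in the priors

$$ \pi_0^l(\lambda) = \frac{k(\lambda)^l}{\sqrt{\lambda}}, \qquad \pi_1^l(\lambda, p) = \frac{k(\lambda)^l}{\sqrt{\lambda}}\, I(0 < p < 1), \tag{2.6} $$

which we will henceforth consider (for $l$ equal to 0 or 1).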
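The change of variables behind (2.6) can be written out explicitly. The only ingredient is the Jacobian of the map $(\lambda, p) \mapsto (\lambda, p^*)$, applied to the prior $k(\lambda)^l \lambda^{-1/2} I(e^{-\lambda} < p^* < 1)/(1 - e^{-\lambda})$ for $(\lambda, p^*)$:

$$ \frac{\partial(\lambda, p^*)}{\partial(\lambda, p)} = \begin{pmatrix} 1 & 0 \\ -(1 - p)e^{-\lambda} & 1 - e^{-\lambda} \end{pmatrix}, \qquad \left| \det \frac{\partial(\lambda, p^*)}{\partial(\lambda, p)} \right| = 1 - e^{-\lambda}, $$

$$ \pi_1^l(\lambda, p) = \frac{k(\lambda)^l}{\sqrt{\lambda}}\, \frac{I(e^{-\lambda} < p^* < 1)}{1 - e^{-\lambda}}\, (1 - e^{-\lambda}) = \frac{k(\lambda)^l}{\sqrt{\lambda}}\, I(0 < p < 1), $$

since $p^* = p + (1 - p)e^{-\lambda}$ increases from $e^{-\lambda}$ to 1 as $p$ goes from 0 to 1.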

We are not aware of any desiderata that would suggest a preference for either
the $l = 0$ prior or the $l = 1$ prior, but luckily the two yield almost the same answers.
Indeed, simple algebra shows that $k(\lambda)$ is a strictly increasing function of $\lambda$ and
that

$$ \inf_{\lambda} k(\lambda) = \frac{1}{\sqrt{2}} \approx 0.71 \quad \text{and} \quad \sup_{\lambda} k(\lambda) = 1. \tag{2.7} $$

Thus $k(\lambda)$ is quite flat as a function of $\lambda$, so that $k(\lambda)^1$ and $k(\lambda)^0 = 1$ are very
similar. Since each marginal $m_i^1(x)$ lies between $2^{-1/2} m_i^0(x)$ and $m_i^0(x)$, an
immediate consequence for the Bayes factors $B_{10}^l$, $l = 0, 1$, is that

$$ \frac{1}{\sqrt{2}} \le \frac{B_{10}^1}{B_{10}^0} \le \sqrt{2}, $$

so that the two Bayes factors can only differ by a modest amount (and in practice
the difference is much smaller than this).
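The flatness of $k(\lambda)$ claimed in (2.7) is easy to see numerically. The sketch below uses hypothetical names, and its grid from 0.01 to 100 is an arbitrary choice. It evaluates $k(\lambda)$ on a logarithmic grid and checks that the values are non-decreasing and lie in $[1/\sqrt{2}, 1]$. This means $k(\lambda)^1$ never differs from $k(\lambda)^0 = 1$ by more than a factor $\sqrt{2}$.

```python
import math

def k_factor(lam):
    """k(lambda) = sqrt(1 - (1 + lambda) e^{-lambda}) / (1 - e^{-lambda})."""
    return math.sqrt(1.0 - (1.0 + lam) * math.exp(-lam)) / -math.expm1(-lam)

grid = [10 ** (e / 10) for e in range(-20, 21)]   # lambda from 0.01 to 100
values = [k_factor(lam) for lam in grid]

print(min(values), 1 / math.sqrt(2))   # close to the infimum as lambda -> 0
print(max(values))                     # essentially 1 for large lambda
```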

It is obviously a bit simpler to work with the $l = 0$ prior, so we drop the $l$
superscript and henceforth utilize the prior

$$ \pi_0(\lambda) = \frac{1}{\sqrt{\lambda}}, \qquad \pi_1(p, \lambda) = \frac{1}{\sqrt{\lambda}}\, I(0 < p < 1). \tag{2.8} $$

2.3. Objective Bayes factor for Poisson versus ZIP models 

Recall that the model $M_0$ is the standard Poisson model and the model $M_1$ is
the ZIP model. For a sample of $n$ counts $X_1, \ldots, X_n$, let $x$ denote the sample,
$k = \sum_{i=1}^n I(x_i = 0)$ be the number of zero counts, and $s = \sum_{i=1}^n x_i$ be the total
count. Note that $k = n$ is equivalent to $s = 0$. For given data $x$, the densities
$f_0(x \mid \lambda)$ and $f_1(x \mid \lambda, p)$ under the two models are given by

$$ f_0(x \mid \lambda) = \frac{e^{-n\lambda} \lambda^s}{\prod_{i=1}^n x_i!}, \qquad f_1(x \mid \lambda, p) = \frac{[p + (1 - p)e^{-\lambda}]^k (1 - p)^{n-k} e^{-(n-k)\lambda} \lambda^s}{\prod_{i=1}^n x_i!}. $$

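For $s > 0$ (i.e., the counts are not all zero),

$$ m_0(x) = \int_0^\infty f_0(x \mid \lambda)\, \pi_0(\lambda)\, d\lambda = \frac{\Gamma(s + 1/2)}{n^{s+1/2} \prod_{i=1}^n x_i!}. $$

Using the binomial expansion of $[p + (1 - p)e^{-\lambda}]^k$ and the beta integral
$\int_0^1 p^{k-j}(1 - p)^{n-k+j}\, dp = (k - j)!\,(n - k + j)!/(n + 1)!$,

$$ m_1(x) = \int_0^\infty \int_0^1 f_1(x \mid \lambda, p)\, \pi_1(p, \lambda)\, dp\, d\lambda = \frac{\Gamma(s + 1/2)}{\prod_{i=1}^n x_i!} \sum_{j=0}^{k} \frac{k!\,(n - k + j)!}{j!\,(n + 1)!}\, (n - k + j)^{-(s+1/2)}. $$

Both $m_0(x)$ and $m_1(x)$ are finite and the Bayes factor $B_{10}(x) = m_1(x)/m_0(x)$ is

$$ B_{10}(x) = \frac{k!}{(n + 1)!} \sum_{j=0}^{k} \frac{(n - k + j)!}{j!} \left( \frac{n}{n - k + j} \right)^{s + 1/2}. \tag{2.9} $$

Note that, as intuitively expected, for any given $n$ the Bayes factor is increasing in
$s$ (total count) for any fixed $k$ (the number of zeros), and is increasing in $k$ for any
fixed $s$. We use (2.9) to calculate the Bayes factors for the examples in Section 3.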
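Because the factorials in (2.9) overflow quickly, a numerical implementation is best done on the log scale. The following minimal sketch uses hypothetical function names. It evaluates (2.9) with `math.lgamma` and a log-sum-exp over $j$, and it assumes at least one positive count, since (2.9) does not apply when $s = 0$. The last lines illustrate the monotonicity in $s$ and $k$ noted in the text on small made-up samples.

```python
import math

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def bayes_factor_zip(counts):
    """Objective Bayes factor B10 of ZIP (M1) to Poisson (M0), equation (2.9)."""
    n = len(counts)
    k = sum(1 for x in counts if x == 0)    # number of zero counts
    s = sum(counts)                          # total count
    if s == 0:
        raise ValueError("(2.9) requires at least one positive count; see Section 5")
    log_terms = []
    for j in range(k + 1):
        m = n - k + j                        # m >= 1 because s > 0 forces n - k >= 1
        log_terms.append(
            math.lgamma(k + 1) - math.lgamma(n + 2)      # log k! - log (n+1)!
            + math.lgamma(m + 1) - math.lgamma(j + 1)    # log (n-k+j)! - log j!
            + (s + 0.5) * (math.log(n) - math.log(m))    # log (n/(n-k+j))^(s+1/2)
        )
    return math.exp(log_sum_exp(log_terms))

# Made-up samples with n = 10, for illustration only.
base = [0] * 6 + [1, 1, 1, 1]        # k = 6, s = 4
more_total = [0] * 6 + [1, 1, 1, 2]  # k = 6, s = 5
more_zeros = [0] * 7 + [1, 1, 2]     # k = 7, s = 4
print(bayes_factor_zip(base), bayes_factor_zip(more_total), bayes_factor_zip(more_zeros))
```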

When $s = 0$, or equivalently all counts are zero ($x = 0$), there is a problem.
While $m_0(0) = \Gamma(1/2)/\sqrt{n}$ remains finite, it is easy to see that $m_1(0)$ is infinite.
Indeed, for any prior of the form $h(p)\pi(\lambda)$, where $\pi(\lambda)$ is improper and $h(p)$ is
a proper density (as is required for testing), the marginal density $m_1(0)$ will be
infinite. This is because, for $x = 0$, the density satisfies $f_1(x \mid \lambda, p) \ge p^n$, implying
$m_1(0) \ge \int_0^1 p^n h(p)\, dp \int_0^\infty \pi(\lambda)\, d\lambda = \infty$. We discuss what to do for this case in Section 5.

3. Applications 

In this section we apply our methodology to two datasets to detect whether zero-inflation is
present in the data. These examples have been analyzed for zero-inflation previously 
using both frequentist and Bayesian procedures. Since there are non-zero counts in 
both examples, the Bayes factors are computed using (2.9). 

Example 3.1. The first dataset is the Urinary Tract Infection (UTI) data used 
in Broek [10], which used a score test to detect zero-inflation in a Poisson model. 
The data are collected from 98 HIV-infected men treated at the Department of 
Internal Medicine at the Utrecht University hospital. The number of times they 
had a urinary tract infection was recorded as X. The data are recorded in Table 1. 
Merely by looking at the data it is apparent that zero-inflation is present.

Equation (2.9) yields a Bayes factor $B_{10} = 223.13$ in favor of model $M_1$ versus
model $M_0$; if the models were believed to be equally likely a priori, the resulting
posterior model probabilities would be $\Pr(M_1 \mid x) = 0.995$ and $\Pr(M_0 \mid x) =
0.005$. This is indeed strong evidence in favor of the ZIP model.

In Bayesian testing of $H_0: p \le 0$ versus $H_1: p > 0$, Bhattacharya et al.
[9] obtained $\Pr(p > 0 \mid x) = 0.999$. The observed value of the score statistic was
reported as 15.34 [10], yielding a p-value of 0.0001. All three analyses present strong
evidence in favor of the ZIP model, but notice that the p-value seems to suggest
stronger evidence against the Poisson null than the Bayesian analysis, and the point
null Bayesian analysis suggests weaker evidence than the interval Bayesian test.

Table 1
UTI Data

X            0     1     2     3     Total
Frequency    81    9     7     1     98

Table 2
Terror Data

X            0     1     2     3     4     Total
Frequency    38    26    8     2     1     75
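The values reported in Examples 3.1 and 3.2 can be reproduced in outline from the frequency tables. The sketch below uses hypothetical names. It expands Tables 1 and 2 into samples, evaluates (2.9) on the log scale, and converts each Bayes factor to $\Pr(M_1 \mid x)$ under equal prior probabilities. We would expect it to give values close to the reported $B_{10} = 223.13$ and $B_{10} = 0.28$, up to rounding.

```python
import math

def bayes_factor_zip(counts):
    """B10 of ZIP to Poisson from (2.9), computed on the log scale."""
    n, k, s = len(counts), sum(1 for x in counts if x == 0), sum(counts)
    if s == 0:
        raise ValueError("(2.9) requires at least one positive count")
    logs = [math.lgamma(k + 1) - math.lgamma(n + 2)
            + math.lgamma(n - k + j + 1) - math.lgamma(j + 1)
            + (s + 0.5) * (math.log(n) - math.log(n - k + j))
            for j in range(k + 1)]
    top = max(logs)
    return math.exp(top + math.log(sum(math.exp(v - top) for v in logs)))

def expand(table):
    """Turn a {value: frequency} table into a flat list of counts."""
    return [x for x, freq in table.items() for _ in range(freq)]

tables = {
    "UTI (Table 1)": {0: 81, 1: 9, 2: 7, 3: 1},
    "Terror (Table 2)": {0: 38, 1: 26, 2: 8, 3: 2, 4: 1},
}
results = {}
for name, table in tables.items():
    b10 = bayes_factor_zip(expand(table))
    results[name] = b10
    print(f"{name}: B10 = {b10:.2f}, Pr(M1 | x) = {b10 / (1 + b10):.3f}")
```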

Example 3.2. The next dataset we consider is the Terrorism data from [11]. Table
2 gives the number of incidents of international terrorism per month ($X$) in the
United States between 1968 and 1974. It is not intuitively clear whether or not
there is zero-inflation in this data set.

The Bayes factor here is $B_{10} = 0.28$, yielding an objective posterior probability
$\Pr(M_1 \mid x) = 0.219$, which actually supports the Poisson model. A previous
analysis found $\Pr(p > 0 \mid x) = 0.507$, an indeterminate value [9]. The observed
value of the score statistic is 0.04, with a p-value of 0.83. Conigliani et al. [11] test a
Poisson null model against a nonparametric alternative, finding a fractional Bayes
factor $B_{10}$ of 0.0089 of the nonparametric alternative to the Poisson; the apparent
strength of this conclusion, compared with the other results, is rather puzzling.

4. Model selection in ZIP regression 

Many applications involve count data where covariate information is available; see,
for example, [14] and [18]. In this section we consider selecting between Poisson
regression and ZIP regression models given by

$$ M_0: X_i \overset{ind}{\sim} \text{Poisson}(\lambda_i), \qquad i = 1, \ldots, n, \tag{4.1} $$

$$ M_1: X_i \overset{ind}{\sim} \text{ZIP}(\lambda_i, p), \qquad i = 1, \ldots, n. \tag{4.2} $$

For a known offset variable $a_{0i}$, a $q \times 1$ vector of covariates $a_i$ and regression
parameters $\beta = (\beta_1, \ldots, \beta_q)^T$, suppose the $\lambda_i$ follow the log-linear relationship

$$ \log(\lambda_i) = a_{0i} + a_i^T \beta. $$

We assume that the matrix $A = (a_1, \ldots, a_n)^T$ is of rank $q$. Let $k$ denote the number
of zero counts in the data. For simplicity of notation, we index the observations in
such a way that all the zeros are given by the first $k$ counts.

4.1. Objective priors for model selection

Generalizing the argument in Section 2.2 to the regression case is easy in one case,
but difficult in the other. If we choose to base the analysis on the Jeffreys prior for
$\beta$ under the Poisson regression model $M_0$, the generalization is straightforward:
the Jeffreys prior is easily computed as

$$ \pi_J^0(\beta) = \Big| \sum_{i=1}^n \lambda_i a_i a_i^T \Big|^{1/2}. \tag{4.3} $$

Note that this prior is positive since the rank of $A$ is $q$. Also, utilizing this prior for
$\beta$ under model $M_1$, along with the independent uniform prior for $p$, results in the
following priors to be utilized to compute $B_{10}$:

$$ \pi_0^0(\beta) = \Big| \sum_{i=1}^n \lambda_i a_i a_i^T \Big|^{1/2}, \qquad \pi_1^0(\beta, p) = \Big| \sum_{i=1}^n \lambda_i a_i a_i^T \Big|^{1/2} I(0 < p < 1). \tag{4.4} $$

The generalization to the regression case of the second prior considered in Section
2.2 is much more difficult, because the Jeffreys prior under the ZIP regression model
is very complicated. In Section 2.2, the derivation of the corresponding Jeffreys prior
was essentially done by ignoring the zero counts, utilizing only the truncated Poisson
distribution. This suggests modifying (4.3) by removing the terms corresponding
to the zero counts, resulting in

$$ \pi_J^1(\beta) = \Big| \sum_{i=k+1}^n \lambda_i a_i a_i^T \Big|^{1/2}. \tag{4.5} $$

From another intuitive perspective, the zero counts arising from the inflation factor
are clearly irrelevant in fitting the log-linear model to the $\lambda_i$ and, since we do not
know which zero counts arise from the inflation factor, dropping them all from the
Jeffreys prior has an appeal. Let $A_+ = (a_{k+1}, \ldots, a_n)^T$. The prior (4.5) can only
be used provided it is positive, which is ensured if the rank of $A_+$ is $q$.
The resulting overall prior for use in computing $B_{10}$ is then

$$ \pi_0^1(\beta) = \Big| \sum_{i=k+1}^n \lambda_i a_i a_i^T \Big|^{1/2}, \qquad \pi_1^1(\beta, p) = \Big| \sum_{i=k+1}^n \lambda_i a_i a_i^T \Big|^{1/2} I(0 < p < 1). \tag{4.6} $$
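A sketch of (4.3) and (4.5) follows. It assumes NumPy is available, and the function name and toy design are illustrative only. Both priors are $|\sum_i \lambda_i a_i a_i^T|^{1/2}$ with $\log \lambda_i = a_{0i} + a_i^T \beta$, and they differ only in which rows enter the sum. Dropping positive semidefinite terms cannot increase the determinant, so (4.5) never exceeds (4.3). It collapses to zero when the rows kept do not have rank $q$.

```python
import numpy as np

def jeffreys_prior_reg(beta, A, offset, rows=None):
    """|sum_i lambda_i a_i a_i^T|^{1/2}, with log(lambda_i) = offset_i + a_i^T beta.

    rows=None sums over all observations, as in (4.3); passing the indices of the
    positive counts gives the modified prior (4.5).
    """
    A = np.asarray(A, dtype=float)
    lam = np.exp(np.asarray(offset, dtype=float) + A @ np.asarray(beta, dtype=float))
    if rows is not None:
        A, lam = A[rows], lam[rows]
    info = A.T @ (lam[:, None] * A)          # sum_i lambda_i a_i a_i^T  (q x q)
    return float(np.sqrt(max(np.linalg.det(info), 0.0)))

# Toy design with an intercept and one covariate (q = 2); the numbers are made up.
A = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.3]]
offset = [0.0, 0.0, 0.0, 0.0]
beta = [0.1, 0.2]
counts = [0, 0, 2, 1]                        # zeros first, as in the text
positive = [i for i, x in enumerate(counts) if x > 0]
full = jeffreys_prior_reg(beta, A, offset)                  # (4.3)
modified = jeffreys_prior_reg(beta, A, offset, positive)    # (4.5)
print(full, modified)
```

With a single intercept and no offset, the full prior reduces to $\sqrt{n \lambda}$ in $\beta = \log \lambda$, which corresponds to $\lambda^{-1/2}$ on the $\lambda$ scale, in line with Section 2.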

The first basic issue in use of these priors is whether or not they yield finite
marginal distributions. This is addressed in the following theorems, the first of
which deals with the marginal density under the Poisson regression model.

Theorem 4.1. For the Poisson regression model and either the Jeffreys prior ($j = 0$)
or the modified Jeffreys prior ($j = 1$),

$$ m_0^j(x) = \int_{\mathbb{R}^q} \prod_{i=1}^n \left\{ \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!} \right\} \pi_J^j(\beta)\, d\beta < \infty. \tag{4.7} $$

Proof. See the Appendix. □



Note that with more than one covariate there is typically no closed-form expression
for $m_0^j(x)$. Hence $m_0^j(x)$ needs to be evaluated by numerical or Monte Carlo
integration.

For the ZIP regression model, the marginal density $m_1^\pi(x)$, under an arbitrary
improper prior $\pi(\beta)$ for $\beta$ and an independent uniform prior for $p$, is given by

$$ m_1^\pi(x) = \int_{\mathbb{R}^q} \int_0^1 f_1(x \mid \beta, p)\, \pi(\beta)\, dp\, d\beta, \tag{4.8} $$

where the density of $x$ under model $M_1$ is given by

$$ f_1(x \mid \beta, p) = \prod_{i=1}^k \{ p + (1 - p)e^{-\lambda_i} \}\, (1 - p)^{n-k} \prod_{i=k+1}^n \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!}. $$

Again, as for $m_0^j(x)$, there is usually no closed-form expression for $m_1^\pi(x)$, and the
marginal needs to be computed via numerical or Monte Carlo integration.
To investigate the finiteness of $m_1^\pi(x)$, note first that

$$ p^k (1 - p)^{n-k} \prod_{i=k+1}^n \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!} \le f_1(x \mid \beta, p) \le \prod_{i=k+1}^n \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!}. \tag{4.9} $$

In view of this inequality and the independent uniform prior for $p$, the marginal
$m_1^\pi(x)$ is finite if and only if

$$ \int_{\mathbb{R}^q} \prod_{i=k+1}^n e^{-\lambda_i} \lambda_i^{x_i}\, \pi(\beta)\, d\beta < \infty. \tag{4.10} $$

Theorem 4.2 below gives sufficient conditions for this to be finite under the priors
(4.3) and (4.5), respectively. Recall that the $k$ zeros in the sample are labeled to
correspond to the first $k$ observations. A key condition will be that the matrix $A_+$
has rank $q$, which implies that $n \ge k + q$ (analogous to the condition of at least one
positive count for the case of no covariate treated in Section 2).
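The key condition for the modified prior, $\text{rank}(A_+) = q$, can be verified before attempting any integration. The minimal sketch below assumes NumPy, and the function name is hypothetical. `numpy.linalg.matrix_rank` uses an SVD with a default tolerance, and that tolerance may deserve attention for badly scaled covariates.

```python
import numpy as np

def positive_rows_full_rank(A, counts):
    """True if the rows of A for the positive counts (the matrix A_+) have rank q."""
    A = np.asarray(A, dtype=float)
    q = A.shape[1]
    A_plus = A[np.asarray(counts) > 0]
    if A_plus.shape[0] < q:          # rank q is impossible unless n - k >= q
        return False
    return int(np.linalg.matrix_rank(A_plus)) == q

A = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.3]]
print(positive_rows_full_rank(A, [0, 0, 2, 1]))   # two distinct positive rows
print(positive_rows_full_rank(A, [0, 0, 0, 4]))   # only one positive count
```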

Theorem 4.2. Using $\pi_J^0(\beta)$: Suppose that, for the observations $x_j$, $j = 1, \ldots, k$,
corresponding to the zero counts, the corresponding covariate vector $a_j$ is such that

$$ a_j = \sum_{m=k+1}^n c_{mj}\, a_m \quad \text{with } c_{mj} \ge 0, \quad j = 1, \ldots, k, \; m = k+1, \ldots, n. \tag{4.11} $$

Then the marginal $m_1^0(x)$ is finite.

Using $\pi_J^1(\beta)$: If $A_+$ has rank $q$, the marginal $m_1^1(x)$ is finite.

Proof. See the Appendix. □

Clearly the condition under which $m_1^\pi(x)$ is finite is more general and much
easier to check for $\pi_J^1(\beta)$ than for $\pi_J^0(\beta)$. This, together with the intuitive appeal
of $\pi_J^1(\beta)$, leads us to recommend its use in practice. (Note that either of the two
priors reduces to the prior recommended in Section 2 for the non-regression case.)

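Remark 4.1. If the condition (4.11) fails, the marginal density $m_1^0(x)$ based on
the Jeffreys prior may be infinite. For example, consider $n = 3$ and $q = 2$, with
$\lambda_1 = \lambda_2^{c_1} \lambda_3^{c_2}$, $\lambda_2 = \exp(\beta_1)$, $\lambda_3 = \exp(\beta_2)$ for suitable nonzero $c_1, c_2$ to be chosen
later. Then the determinant of the information matrix for $\beta$ is given by

$$ |I(\beta)| = \lambda_2 \lambda_3 + c_1^2 \lambda_2^{c_1} \lambda_3^{c_2+1} + c_2^2 \lambda_2^{c_1+1} \lambda_3^{c_2}, $$

so that $|I(\beta)|^{1/2} \ge |c_1| \lambda_2^{c_1/2} \lambda_3^{(c_2+1)/2}$. If $X_1 = 0$, $X_2 = x_2$ and $X_3 = x_3$, then, changing
variables from $(\beta_1, \beta_2)$ to $(\lambda_2, \lambda_3)$ and using $\int_0^1 p(1 - p)^2\, dp = 1/12$,

$$ m_1^0(x) \ge \frac{|c_1|}{12\, x_2!\, x_3!} \int_0^\infty e^{-\lambda_2} \lambda_2^{x_2 - 1 + c_1/2}\, d\lambda_2 \int_0^\infty e^{-\lambda_3} \lambda_3^{x_3 - 1 + (c_2+1)/2}\, d\lambda_3 = \infty, $$

provided that $x_2 \le -0.5 c_1$ or that $x_3 \le -0.5 - 0.5 c_2$. For example, if $c_1 = -5$ and a
sample produces $x_2 = 2$, then $m_1^0(x) = \infty$. Note that here $a_1 = -5 a_2 + c_2 a_3$, with
$a_2 = (1, 0)^T$ and $a_3 = (0, 1)^T$, so that the condition (4.11) does not hold.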
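The determinant in Remark 4.1 and the divergence conditions can be checked numerically. The sketch below uses hypothetical names and assumes NumPy. It builds the $2 \times 2$ information matrix $\sum_i \lambda_i a_i a_i^T$ with $a_1 = c_1 a_2 + c_2 a_3$ and compares its determinant with the closed form. It then encodes when one of the two gamma-type integrals diverges: a nonpositive exponent $x_2 + c_1/2$ or $x_3 + (c_2 + 1)/2$.

```python
import numpy as np

def info_det_direct(beta, c1, c2):
    """det(sum_i lambda_i a_i a_i^T) for the three-observation design of Remark 4.1."""
    a2, a3 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    a1 = c1 * a2 + c2 * a3
    lam2, lam3 = np.exp(beta)
    lam1 = lam2 ** c1 * lam3 ** c2           # equals exp(a1^T beta)
    info = (lam1 * np.outer(a1, a1) + lam2 * np.outer(a2, a2)
            + lam3 * np.outer(a3, a3))
    return float(np.linalg.det(info))

def info_det_formula(beta, c1, c2):
    """Closed form lambda2 lambda3 + c1^2 lambda2^c1 lambda3^(c2+1) + c2^2 lambda2^(c1+1) lambda3^c2."""
    lam2, lam3 = np.exp(beta)
    return float(lam2 * lam3 + c1**2 * lam2**c1 * lam3**(c2 + 1)
                 + c2**2 * lam2**(c1 + 1) * lam3**c2)

def lower_bound_diverges(x2, x3, c1, c2):
    """True when int_0^inf e^{-l} l^{e-1} dl diverges (e <= 0) for one of the exponents."""
    return x2 + c1 / 2 <= 0 or x3 + (c2 + 1) / 2 <= 0

for beta in ([0.0, 0.0], [0.3, -0.8], [-1.2, 0.5]):
    print(info_det_direct(beta, -5.0, 1.0), info_det_formula(beta, -5.0, 1.0))
print(lower_bound_diverges(2, 0, -5.0, 1.0))   # the example c1 = -5, x2 = 2
```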
4.2. An illustrative application

We apply the methodology recommended in Section 4.1 to a dataset involving the
number of AIDS-related deaths in men. The data provide the number of deaths for
598 census tracts in a large city of Spain over a period of eight years. The dataset,
which was supplied to us by Dr. M.A.M. Beneyto, has a large number of tracts
with zero deaths (actually, 303, which is $k$ in our notation). Along with the number
of deaths, the dataset also provides, for each census tract, the expected number
of deaths $E$ from AIDS (adjusting for the population and the distribution of ages
in each tract) and an auxiliary variable $W$ (continuous in nature) measuring the
social status of each census tract.

In our application and for the $i$th census tract, we take $\log(E_i)$ as the offset $a_{0i}$
and propose a log-linear regression for $\lambda_i$ with $q = 2$ and $a_i = (1, W_i)^T$. First,
we will ignore the covariate $W$ and compute the Bayes factor taking $q = 1$ and
$a_i = 1$ based on the Jeffreys prior. This model modifies the common mean model
of Section 2.2 by incorporating the offset variable in the mean, which is here given by
$E_i \lambda$ with $\lambda = \exp(\beta_1)$. The marginal $m_1(x)$ is computed by one-dimensional numerical
integration. Although it has a closed-form expression, it is rather complicated and
omitted here to save space. This expression is given in the Appendix in [1]. For the
specific data here, $B_{10} = 22{,}975$, which gives overwhelming evidence in favor of the
ZIP model.

Epidemiologists who are knowledgeable about this study believed that the large 
number of zero counts in the data could be explained by the covariate measuring 
the social status and, indeed, suspected that a ZIP regression model would not be 
needed if the covariate were incorporated into the analysis. The Bayes factor in 
favor of the ZIP regression model versus the Poisson regression model (with $q = 2$)
is given by 7.25. While this Bayes factor provides a moderate amount of evidence in 
favor of the ZIP regression model, it is much smaller than 22,975, indicating that, 
indeed, the covariate can explain most of the excess zero counts. 

In this example, it is possible that the same inflation parameter p may not be 
appropriate for all individuals. Just like using the log-linear models for $\lambda_i$, we can
treat each pi differently (as p may change according to the covariates) and fit a 
logistic regression model for pi. But it is highly likely that there would be severe 
confounding between the two regressions, which is particularly problematical with 
objective Bayesian analysis (since there is not a proper subjective prior to overcome 
the confounding). 

5. Analysis with insufficient positive counts 

As noted in Section 2, the marginal density under model $M_1$ based on an improper
prior for A is not finite when all counts are zeros, and hence the Bayes factor is not 
well-defined. This is not a difficulty of only model selection; in this situation, it is 
also not possible to make inferences about the parameters of the ZIP model, since 
the joint posterior of the parameters (under the ZIP model) is improper. Indeed, 
when all counts are zero, the ZIP model parameters are not identifiable, and the 
data do not provide enough information to estimate the parameters. Since objective 
Bayes methods are typically based on information from the data alone, it is not 
surprising that problems are encountered. 

We could simply invoke this argument and refrain from considering the case
when all counts are zero. However, it is interesting to explore several methodologies
that have been proposed for difficult testing situations, partly to judge the success
of the methodologies and partly to try to provide a reasonable answer to this case.
We continue, throughout the section, to assume that $p \sim Un(0, 1)$.

5.1. All zero counts in the non-regression case 

We mentioned that to resolve the identifiability issue in the ZIP model for data
with all zeros we need a proper prior on $\lambda$. This can be done either by subjectively
specifying a proper prior for $\lambda$ or by "training" the improper priors into proper
priors based on part of the data or of the likelihood. In particular, the intrinsic
Bayes factor approach [5] utilizes a part of the data as a training sample to train the
improper prior to get a proper posterior. Although this approach works successfully
in many examples, it is not successful in the present problem. Our investigation of
this approach [1] is omitted here to save space. We discuss below the case where a
subjective proper prior on $\lambda$ is specified based on certain considerations.

If a proper prior is needed to define the Bayes factor for the situation of all zero
counts, the most direct approach is to find a proper prior that seems compatible with
certain behaviors that we expect of the Bayes factor in this situation. A natural
proper prior to consider for $\lambda$ is a Gamma ($Ga(a, b)$) conjugate prior under the
Poisson model ($M_0$) given by the Gamma $g(\lambda \mid a, b)$ density

$$ g(\lambda \mid a, b) = \frac{b^a}{\Gamma(a)}\, \lambda^{a-1} e^{-b\lambda}, \qquad \lambda > 0, $$

where $a, b$ are suitably chosen positive constants. Of course, one is welcome to
simply make subjective choices here, but we will argue for a certain choice (or
choices) based on rather neutral thinking.

First, we assume that the same gamma prior is appropriate for $\lambda$, both under
the Poisson and the ZIP models. This can be justified by the orthogonalization
argument used in Section 2.2. With the uniform density for $p$ and the $Ga(a, b)$
prior for $\lambda$, the resulting Bayes factor for arbitrary data $x$ can be computed to be

$$ B_{10}(x) = \frac{k!}{(n + 1)!} \sum_{j=0}^{k} \frac{(n - k + j)!}{j!} \left( \frac{n + b}{n - k + j + b} \right)^{s + a} \tag{5.1} $$

by a similar argument to that leading to (2.9). This Bayes factor includes as a
special case the objective Bayes factor in (2.9); indeed, the Jeffreys prior used there
was a limiting case of $g(\lambda \mid a, b)$ for $a = 1/2$ and $b = 0$. Note that the Bayes
factor (5.1) is increasing in $s$, $k$ and $a$, and decreasing in $b$.
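The Bayes factor (5.1) generalizes the earlier computation by replacing $(1/2, 0)$ with $(a, b)$. The following sketch uses hypothetical names and evaluates the formula on the log scale. It implements (5.1) and illustrates the stated monotonicity in $a$ and $b$. With $a = 1/2$ and $b = 0$ it coincides with (2.9), and for all-zero data it requires $b > 0$.

```python
import math

def bayes_factor_gamma(counts, a, b):
    """B10 of ZIP to Poisson with Un(0,1) on p and Ga(a, b) (rate b) on lambda, (5.1)."""
    n, k, s = len(counts), sum(1 for x in counts if x == 0), sum(counts)
    logs = []
    for j in range(k + 1):
        m = n - k + j
        if m + b <= 0:
            raise ValueError("all-zero data need b > 0")
        logs.append(math.lgamma(k + 1) - math.lgamma(n + 2)
                    + math.lgamma(m + 1) - math.lgamma(j + 1)
                    + (s + a) * (math.log(n + b) - math.log(m + b)))
    top = max(logs)
    return math.exp(top + math.log(sum(math.exp(v - top) for v in logs)))

sample = [0] * 6 + [1, 1, 2]    # made-up data: n = 9, k = 6, s = 4
print(bayes_factor_gamma(sample, 0.5, 0.0))    # the objective Bayes factor (2.9)
print(bayes_factor_gamma(sample, 1.0, 0.5), bayes_factor_gamma(sample, 1.0, 2.0))
```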

For the special case $x = 0$ (that is, $s = 0$ and $k = n$), note that $f_1(0 \mid \lambda, p) \ge
f_0(0 \mid \lambda)$. Hence, using the same proper prior for $\lambda$ with both the Poisson and the
ZIP models, it follows that $m_1(0) > m_0(0)$, and hence $B_{10}(0) > 1$. In particular,
for the $Un(0, 1)$ prior for $p$ and $Ga(a, b)$ prior for $\lambda$, it can be checked that

$$ B_{10}(0) = \frac{1}{n + 1} \sum_{j=0}^{n} \left( \frac{n + b}{j + b} \right)^{a}. \tag{5.2} $$

This is reasonable: when a long stream of only zeros is observed, it is entirely natural
to say that the data favor the ZIP model. But the degree of favoritism depends on
$a$ and $b$, and we turn to rather speculative desiderata to narrow the choice. Recall
that the mean of the $Ga(a, b)$ distribution for $\lambda$ is $a b^{-1}$ and the variance is $a b^{-2}$.
In order for the prior not to be too sharp, it is reasonable to require the prior
standard deviation to be no less than the prior mean. Since the standard deviation is
$\sqrt{a}/b$ and the mean is $a/b$, this implies that $a \le 1$. It also
seems reasonable to require the prior mean to be at least 1, so that small values of $\lambda$
do not have excessive prior probability. This leads to $b \le a$. Since the Bayes factor
is decreasing in $b$, the smallest Bayes factor satisfying the above constraints (that
is, the one lending the most support to the Poisson model $M_0$) is then obtained
by taking $b = a$ (this gives a prior mean of 1). It is not unreasonable to select this
prior, as it is the member of a reasonable class that is most favorable to the null model.
Finally, one might judge it to be unappealing to utilize a prior for $\lambda$ which is not
bounded near zero (for $a < 1$ the gamma density is decreasing with an asymptote
at $\lambda = 0$), which implies that $a$ should be at least 1. Thus we end up with the choice
$a = b = 1$. Note that $a = 1$ is the upper limit of $a \le 1$, and the choice $a = 1$ now
counterbalances the Bayes factor in favor of $M_1$ (whereas $b = a$ in the range $b \le a$
tilts the Bayes factor in favor of $M_0$). This reasoning is all rather speculative and,
of course, the result is a particular prior, which may not reflect actual prior beliefs.
Nevertheless, it is instructive to study the behavior of the Bayes factor when this
prior is used.

For $a = b = 1$, that is, the Exponential(1) distribution, it can be checked that
$B_{10}(0) = \sum_{j=0}^{n} (j + 1)^{-1}$, which is thus our recommended default Bayes factor when
observing only zero counts. Note that $B_{10}(0) \approx \log(n + 1)$ for large $n$. So a long
string of all zero counts in a sample will lead to a Bayes factor approaching infinity
at the slow rate of $\log(n)$. The large-sample behavior of the Bayes factor for this
type of sample seems intuitively reasonable.
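Formula (5.2) and its $a = b = 1$ special case are easy to tabulate. The sketch below uses hypothetical names. It evaluates (5.2) directly and compares the default choice with $\log(n + 1)$. The difference tends to Euler's constant $\gamma \approx 0.5772$, because $\sum_{j=0}^{n} (j + 1)^{-1}$ is the harmonic number $H_{n+1}$.

```python
import math

def bf_all_zeros(n, a=1.0, b=1.0):
    """B10(0) from (5.2): (1/(n+1)) * sum_{j=0}^{n} ((n+b)/(j+b))^a, for b > 0."""
    return sum(((n + b) / (j + b)) ** a for j in range(n + 1)) / (n + 1)

for n in (1, 10, 100, 10_000):
    bf = bf_all_zeros(n)
    print(n, round(bf, 4), round(bf - math.log(n + 1), 4))   # gap approaches 0.5772
```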

5.2. Insufficient positive counts in the regression case 

In the regression situation of Section 4, it was necessary to have sufficient positive
counts so that the conditions of Theorem 4.2 were satisfied. We will restrict discussion
here to the situation involving the prior specifications in (4.6), for which the key
condition needed for the marginal to be finite was that the $(n - k) \times q$ matrix $A_+$
should be of rank $q$. If the number of positive counts $n - k$ is insufficient, so that $t$,
the rank of $A_+$, is less than $q$, this solution will not work.

Remark 5.1. Indeed, neither the prior for $\beta$ given by (4.3) nor that given by (4.5) guarantees
a finite positive marginal density in this case. We omit the proof to save space. A proof may be
found in the Appendix in [1].

We call this situation one of rank deficiency, with the rank deficiency of $A_+$ equal to $q - t$. The situation is analogous to the case of all zero counts without covariates discussed in Subsection 5.1. (In the setup of that section, $q = 1$, and a rank of $A_+$ less than 1 means that $k = n$, i.e., there are no positive counts.) We could again merely recognize that this type of data is just not informative enough to allow for an objective Bayes analysis. We shall, however, propose a prior that yields finite marginal densities, following reasoning similar to that used in Section 5.1.

We continue to use a Un(0, 1) prior for $p$ and focus on proposing suitable priors for $\beta$. A discussion similar to that in Subsection 5.1 shows that this prior has to be at least partially proper.

Note that, instead of specifying a prior on $\beta$, we can specify a prior on $q$ independent parametric functions of $\beta$; our specific proposal is to choose these functions carefully so that $t$ of them are well identified by the data with positive counts, while the remaining $q - t$ are not. We then propose to use a version of the Jeffreys prior on the former $t$ functions and a proper prior on the latter $q - t$ functions.
Specifically, let $A_0$ denote the $k \times q$ matrix whose $k$ rows are $a_1^T, \ldots, a_k^T$. A rank of $A$ equal to $q$ and a rank of $A_+$ equal to $t$ imply that $A_0$ has rank at least $q - t$. Let $V_+ \subset \mathbb{R}^q$ denote the vector space of dimension $t$ spanned by the columns of $A_+^T$, and let $a_{j_1}, \ldots, a_{j_t}$ be $t$ linearly independent vectors among $a_{k+1}, \ldots, a_n$, so that they form a basis of $V_+$. Suppose $a_{i_1}, \ldots, a_{i_r}$ are all of the vectors among $a_1, \ldots, a_k$ (those corresponding to the zero counts) that lie in $V_+$. Note that $0 \le r \le k - (q - t)$. These vectors are linear combinations of $a_{j_1}, \ldots, a_{j_t}$, and the corresponding $\lambda_{i_1}, \ldots, \lambda_{i_r}$ are functions of $\lambda_{j_1}, \ldots, \lambda_{j_t}$. From the set $\{\lambda_j : j \in \{1, \ldots, k\} \setminus \{i_1, \ldots, i_r\}\}$ we select $q - t$ of the $\lambda$'s, $\lambda_{l_1}, \ldots, \lambda_{l_{q-t}}$, such that $\{a_{j_1}, \ldots, a_{j_t}, a_{l_1}, \ldots, a_{l_{q-t}}\}$ is linearly independent.
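The index sets $\{j_1, \ldots, j_t\}$, $\{i_1, \ldots, i_r\}$ and $\{l_1, \ldots, l_{q-t}\}$ can be found with rank computations alone. The sketch below does this greedily with NumPy; the function name `select_index_sets`, the 0-based indexing, the row ordering (zero counts first) and the small design matrix are illustrative assumptions, and a greedy pass is only one of many valid ways to pick the vectors.

```python
import numpy as np

def select_index_sets(A, k):
    """Hypothetical helper returning (t, J, R, L) as 0-based row indices of A.

    A: n x q design whose first k rows are a_1..a_k (zero counts) and whose
    remaining rows are a_{k+1}..a_n (positive counts).
      J: t rows of A_+ forming a basis of V_+           (the a_j's)
      R: zero-count rows lying in V_+                   (the a_i's)
      L: q - t zero-count rows completing a basis of R^q (the a_l's)
    """
    n, q = A.shape
    rank = np.linalg.matrix_rank
    t = rank(A[k:])
    J = []
    for i in range(k, n):                       # greedy basis of V_+
        if rank(A[J + [i]]) > len(J):
            J.append(i)
    R = [i for i in range(k) if rank(A[J + [i]]) == t]
    L = []
    for i in range(k):                          # greedy completion to rank q
        if i not in R and len(L) < q - t and rank(A[J + L + [i]]) > t + len(L):
            L.append(i)
    return t, J, R, L

# Made-up design with q = 3, k = 4 zero counts and rank(A_+) = 2.
A = np.array([[1, 1, 2],    # zero count, in V_+
              [0, 0, 1],    # zero count, outside V_+
              [1, 0, 0],    # zero count, outside V_+
              [2, 2, 4],    # zero count, in V_+
              [1, 0, 1],    # positive counts from here on
              [0, 1, 1],
              [1, 1, 2],
              [2, 1, 3]], dtype=float)
t, J, R, L = select_index_sets(A, k=4)
print(t, J, R, L)           # 2 [4, 5] [0, 3] [1]
```

Here $r = 2 \le k - (q - t) = 3$, as the inequality in the text requires.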

Note that there is an $(n-k) \times t$ matrix $C$ of rank $t$ such that

$$(a_{k+1}, \ldots, a_n) = (a_{j_1}, \ldots, a_{j_t})\, C^T.$$

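Let $D = D(\lambda_{j_1}, \ldots, \lambda_{j_t})$ denote the diagonal matrix with diagonal entries $\lambda_{j_1}, \ldots, \lambda_{j_t}$. Then the information matrix for $\lambda_{j_1}, \ldots, \lambda_{j_t}$ based on the Poisson model for observations $k+1, \ldots, n$ is given by

$$I(\lambda_{j_1}, \ldots, \lambda_{j_t}) = D^{-1} C^T\, \mathrm{Diag}(\lambda_{k+1}, \ldots, \lambda_n)\, C D^{-1}. \tag{5.3}$$

We define a partial Jeffreys prior for $\lambda_{j_1}, \ldots, \lambda_{j_t}$ by

$$\pi_{PJ}(\lambda_{j_1}, \ldots, \lambda_{j_t}) = \Big\{\prod_{i=1}^{t} \lambda_{j_i}^{-1}\Big\}\, \big|C^T \mathrm{Diag}(\lambda_{k+1}, \ldots, \lambda_n)\, C\big|^{1/2}. \tag{5.4}$$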
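A numerical sanity check of (5.3) and (5.4), under made-up values: the sketch builds $C$ from a small rank-deficient $A_+$ ($q = 3$, $t = 2$) by least squares, compares (5.3) with a finite-difference Hessian of the expected negative Poisson log-likelihood in the coordinates $\lambda_{j_1}, \ldots, \lambda_{j_t}$, and confirms that (5.4) is the square root of the determinant of (5.3). The design, the offsets $\alpha_{0i}$, the value of $\beta$ and the helper names (`rates_from`, `info_matrix`, `partial_jeffreys`) are illustrative assumptions.

```python
import numpy as np

# Made-up rank-deficient example: q = 3, rows a_{k+1}^T..a_n^T of A_+ have rank t = 2.
A_pos = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]], dtype=float)
alpha_pos = np.array([0.1, -0.2, 0.0, 0.3])   # offsets alpha_{0i} (assumed)
beta = np.array([0.2, -0.1, 0.05])            # an arbitrary parameter value (assumed)
J = [0, 1]                                    # rows used as a_{j_1}, a_{j_2}
lam_pos = np.exp(alpha_pos + A_pos @ beta)    # log(lambda_i) = alpha_{0i} + a_i^T beta

# C solves A_+ = C A_J, i.e. (a_{k+1}, ..., a_n) = (a_{j_1}, ..., a_{j_t}) C^T.
Ct, *_ = np.linalg.lstsq(A_pos[J].T, A_pos.T, rcond=None)
C = Ct.T

def rates_from(lam_J):
    """All positive-count rates expressed through lambda_{j_1}, ..., lambda_{j_t}."""
    return np.exp(alpha_pos + C @ (np.log(lam_J) - alpha_pos[J]))

def info_matrix(lam_J):
    """(5.3): D^{-1} C^T Diag(lambda_{k+1..n}) C D^{-1} with D = Diag(lambda_J)."""
    D_inv = np.diag(1.0 / lam_J)
    return D_inv @ C.T @ np.diag(rates_from(lam_J)) @ C @ D_inv

def partial_jeffreys(lam_J):
    """(5.4): prod_i lambda_{j_i}^{-1} times |C^T Diag(lambda_{k+1..n}) C|^{1/2}."""
    M = C.T @ np.diag(rates_from(lam_J)) @ C
    return np.prod(1.0 / lam_J) * np.sqrt(np.linalg.det(M))

def expected_neg_loglik(lam_J, mu):
    """-E[log L] for independent Poisson counts with means mu (constants dropped)."""
    lam = rates_from(lam_J)
    return -np.sum(mu * np.log(lam) - lam)

lam_J0 = lam_pos[J]
I = info_matrix(lam_J0)

# Central-difference Hessian of -E[log L]; at the true point it equals the Fisher information.
h, t = 1e-4, len(J)
E = np.eye(t) * h
H = np.empty((t, t))
for a in range(t):
    for b in range(t):
        H[a, b] = (expected_neg_loglik(lam_J0 + E[a] + E[b], lam_pos)
                   - expected_neg_loglik(lam_J0 + E[a] - E[b], lam_pos)
                   - expected_neg_loglik(lam_J0 - E[a] + E[b], lam_pos)
                   + expected_neg_loglik(lam_J0 - E[a] - E[b], lam_pos)) / (4 * h * h)

print(np.allclose(H, I, atol=1e-5))
print(np.isclose(np.sqrt(np.linalg.det(I)), partial_jeffreys(lam_J0)))
```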

Let $\{b_1, \ldots, b_{q-t}\}$ denote an orthonormal basis of the space spanned by $a_{l_1}, \ldots, a_{l_{q-t}}$. Define $\xi_w = e^{b_w^T \beta}$, $w = 1, \ldots, q - t$. Note that $\lambda_{l_w}$, $w = 1, \ldots, q - t$, can be expressed in terms of $\xi_1, \ldots, \xi_{q-t}$. Indeed,

$$\log(\lambda_{l_w}) = \alpha_{0 l_w} + \sum_{h=1}^{q-t} d_{wh} \log(\xi_h), \qquad w = 1, \ldots, q - t,$$

where $d_{wh} = b_h^T a_{l_w}$. Finally, we assign independent exponential distributions with mean 1 to each of $\xi_1, \ldots, \xi_{q-t}$. This prior induces a proper distribution on $\lambda_{l_w}$, $w = 1, \ldots, q - t$, with a density which we denote by $\pi_{prop}(\lambda_{l_1}, \ldots, \lambda_{l_{q-t}})$. The final prior used to calculate the marginal density under model $M_1$ is then given by

$$\pi(\lambda_{j_1}, \ldots, \lambda_{j_t}, \lambda_{l_1}, \ldots, \lambda_{l_{q-t}}) = \pi_{PJ}(\lambda_{j_1}, \ldots, \lambda_{j_t})\, \pi_{prop}(\lambda_{l_1}, \ldots, \lambda_{l_{q-t}});$$

this prior is partially Jeffreys and partially proper. The corresponding prior density on $\beta$ is, of course, obtained through transformation. Further, along the lines of the proof of Theorem 4.2, it can be checked that the resulting marginal density will be finite. We omit the details to save space.
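The map from $\beta$ to the $\xi_w$ and the induced density $\pi_{prop}$ can be sketched as follows. Assumptions: the orthonormal basis $b_1, \ldots, b_{q-t}$ is taken from a reduced QR factorization (any orthonormal basis would do), and the closed form $\pi_{prop}(\lambda) = \prod_h e^{-\xi_h}\xi_h / (|\det(d_{wh})| \prod_w \lambda_{l_w})$ is our own change-of-variables computation, not a formula stated in the text. The code checks the displayed identity for $\log(\lambda_{l_w})$ on made-up values and that $\pi_{prop}$ integrates to 1 in a one-dimensional case; the helper names are hypothetical.

```python
import numpy as np

def xi_basis(A_L):
    """Orthonormal basis b_1..b_{q-t} (columns of Q) of span{a_{l_1}, ..., a_{l_{q-t}}}
    and the matrix with entries d_{wh} = b_h^T a_{l_w}.

    A_L holds the vectors a_{l_w} as rows; the basis comes from a reduced QR of A_L^T.
    """
    Q, _ = np.linalg.qr(A_L.T)      # shape (q, q - t), orthonormal columns
    return Q, A_L @ Q               # shape (q - t, q - t): entry (w, h) is d_{wh}

def prop_density(lam_L, alpha_L, Dmat):
    """pi_prop at each row of lam_L (shape (m, q - t)), induced by xi_h ~ Exp(1) iid.

    log(lambda_L) = alpha_L + Dmat @ log(xi), so log(xi) = Dmat^{-1}(log(lambda_L) - alpha_L);
    the change of variables gives prod_h exp(-xi_h) xi_h / (|det Dmat| prod_w lambda_w).
    """
    log_xi = np.linalg.solve(Dmat, (np.log(lam_L) - alpha_L).T).T
    xi = np.exp(log_xi)
    jac = np.prod(xi, axis=1) / (abs(np.linalg.det(Dmat)) * np.prod(lam_L, axis=1))
    return np.exp(-xi.sum(axis=1)) * jac

# Identity check with hypothetical a_{l_w}, offsets alpha_{0 l_w} and beta (q = 3, q - t = 2).
A_L = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
alpha_L = np.array([0.1, -0.3])
beta = np.array([0.4, -0.2, 0.7])
Q, Dmat = xi_basis(A_L)
log_xi = Q.T @ beta                 # xi_h = exp(b_h^T beta)
lhs = alpha_L + A_L @ beta          # log(lambda_{l_w}) from the regression
rhs = alpha_L + Dmat @ log_xi       # the same quantity through the xi's
print(np.allclose(lhs, rhs))

# Normalisation check for q - t = 1: integrate pi_prop over lambda = exp(u).
_, d1 = xi_basis(np.array([[0.0, 2.0, 1.0]]))
u = np.linspace(-40.0, 40.0, 80001)
lam = np.exp(u)[:, None]
dens = prop_density(lam, np.array([0.3]), d1)
total = np.sum(dens * lam[:, 0]) * (u[1] - u[0])   # d lambda = lambda du
print(total)                        # close to 1
```

The full prior is then the product of this density with $\pi_{PJ}$ from (5.4), as in the display above.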

While there is arbitrariness in the specific choice of $\lambda_{l_1}, \ldots, \lambda_{l_{q-t}}$ to which a subjective prior distribution based on exponential distributions is assigned, the partial Jeffreys prior in (5.4) remains invariant to the choice of $t$ independent $\lambda$'s from $\lambda_{k+1}, \ldots, \lambda_n$. This solution thus seems reasonable for small $q - t$.
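The invariance claim can be checked numerically: if two different sets of $t$ independent vectors among $a_{k+1}, \ldots, a_n$ are used as coordinates, the two versions of (5.4) should differ exactly by the Jacobian of the map between the corresponding $\lambda$'s, as expected of a Jeffreys-type prior. The sketch assumes a small, made-up design with $q = 3$ and $t = 2$; the helper names are hypothetical.

```python
import numpy as np

# Made-up positive-count design with q = 3 and rank t = 2, plus offsets and a beta value.
A_pos = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]], dtype=float)
alpha_pos = np.array([0.1, -0.2, 0.0, 0.3])
beta = np.array([0.2, -0.1, 0.05])
lam_pos = np.exp(alpha_pos + A_pos @ beta)

def coef_matrix(J):
    """C with A_+ = C A_J, i.e. (a_{k+1}, ..., a_n) = (a_{j_1}, ..., a_{j_t}) C^T."""
    Ct, *_ = np.linalg.lstsq(A_pos[J].T, A_pos.T, rcond=None)
    return Ct.T

def partial_jeffreys(J):
    """(5.4) at the current rates, with lambda_J as coordinates."""
    C = coef_matrix(J)
    M = C.T @ np.diag(lam_pos) @ C
    return np.prod(1.0 / lam_pos[J]) * np.sqrt(np.linalg.det(M))

J1, J2 = [0, 1], [2, 3]            # two admissible choices of t independent lambdas
M12 = coef_matrix(J2)[J1]          # a_{J1} = M12 a_{J2}
# log(lambda_J1) = const + M12 log(lambda_J2), so the Jacobian determinant of
# lambda_J1 with respect to lambda_J2 is |det M12| * prod(lambda_J1) / prod(lambda_J2).
jac = abs(np.linalg.det(M12)) * np.prod(lam_pos[J1]) / np.prod(lam_pos[J2])
print(np.isclose(partial_jeffreys(J2), partial_jeffreys(J1) * jac))
```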

To avoid the arbitrariness, we could consider all possible selections of $q - t$ of the $\lambda$'s from $\lambda_1, \ldots, \lambda_k$ such that these $q - t$ and $t$ of the $\lambda$'s from $\lambda_{k+1}, \ldots, \lambda_n$ define a reparameterization of $\beta$. For each selection we can calculate the Bayes factor and, in the spirit of the IBF, take a suitable average over all these Bayes factors. If the rank deficiency of $A_+$ is 1, we will have $k - r$ Bayes factors to average.
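Enumerating the admissible selections is a short combinatorial loop. In the sketch below, the callable `bayes_factor` is a hypothetical placeholder for whatever routine computes the Bayes factor for one selection, and the arithmetic mean is only one possible "suitable average"; the text does not fix the averaging rule. With rank deficiency 1, the made-up example returns $k - r$ selections, matching the count stated above.

```python
import itertools
import numpy as np

def admissible_selections(A, k, J):
    """All (q - t)-subsets L of zero-count rows 0..k-1 such that the rows J + L span R^q.

    A: n x q design (first k rows are zero counts); J: row indices of a basis of V_+.
    """
    q = A.shape[1]
    size = q - len(J)
    return [list(L) for L in itertools.combinations(range(k), size)
            if np.linalg.matrix_rank(A[list(J) + list(L)]) == q]

def averaged_bayes_factor(bayes_factor, selections):
    """Arithmetic mean of per-selection Bayes factors (one possible 'suitable average').

    `bayes_factor` is a hypothetical callable returning B_10 for one selection.
    """
    values = [bayes_factor(L) for L in selections]
    return sum(values) / len(values)

# Made-up design: q = 3, k = 4 zero counts, rank(A_+) = t = 2 (rank deficiency 1).
A = np.array([[1, 1, 2], [0, 0, 1], [1, 0, 0], [2, 2, 4],    # zero counts
              [1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]],   # positive counts
             dtype=float)
selections = admissible_selections(A, k=4, J=[4, 5])
print(selections)                                               # [[1], [2]]
print(averaged_bayes_factor(lambda L: 1.0 + L[0], selections))  # 2.5 (toy stand-in for B_10)
```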
Appendix

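Proof of Theorem 4.1. From (4.3) and (4.5) it is immediate that $\pi_1(\beta) \le \pi_0(\beta)$. Thus it is enough to prove (4.7) for $j = 0$. Let $\mathbf{i}$ denote the indices $(i_1, \ldots, i_q)$ and $A(\mathbf{i})$ denote the $q \times q$ submatrix of $A$ based on rows $i_1, \ldots, i_q$. Then by the Binet-Cauchy expansion of a determinant (cf. Noble [!!)], p. 226) it can be shown that

$$\Big|\sum_{l=1}^{n} \lambda_l a_l a_l^T\Big| = \sum (\lambda_{i_1} \cdots \lambda_{i_q})\, |A(\mathbf{i}) A(\mathbf{i})^T|, \tag{A1}$$

where the summation is over all submatrices of order $q \times q$. Dropping the terms of the above summation for which $|A(\mathbf{i}) A(\mathbf{i})^T| = 0$, we get from (4.3) that

$$\pi_0(\beta) \le \sum\nolimits^{*} (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2}\, |A(\mathbf{i}) A(\mathbf{i})^T|^{1/2}, \tag{A2}$$

where $\sum^{*}$ denotes summation over all $q \times q$ submatrices for which $|A(\mathbf{i}) A(\mathbf{i})^T| > 0$. Since $e^{-\lambda_i} \lambda_i^{x_i} / x_i! \le 1$, from (4.7) and (A2) we get

$$m_0(x) \le \sum\nolimits^{*} \int \Big\{\prod_{j=1}^{q} \frac{e^{-\lambda_{i_j}} \lambda_{i_j}^{x_{i_j}}}{x_{i_j}!}\Big\} (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2}\, |A(\mathbf{i}) A(\mathbf{i})^T|^{1/2}\, d\beta. \tag{A3}$$

Recall that $\log(\lambda_i) = \alpha_{0i} + a_i^T \beta$. Now transforming $\beta$ to $(\lambda_{i_1}, \ldots, \lambda_{i_q})$ and using the Jacobian of the transformation, $(\lambda_{i_1} \cdots \lambda_{i_q})^{-1} |A(\mathbf{i}) A(\mathbf{i})^T|^{-1/2}$, we get from (A3) that

$$m_0(x) \le \sum\nolimits^{*} \prod_{j=1}^{q} \int_0^\infty \frac{e^{-\lambda} \lambda^{x_{i_j} - 1/2}}{x_{i_j}!}\, d\lambda < \infty, \tag{A4}$$

since each of the integrals on the right-hand side of (A4) is finite. This completes the proof of Theorem 4.1. □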
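The Binet-Cauchy identity (A1) and the two bounds used in (A2) and (A5) are easy to verify numerically for a random design. The sketch assumes that (4.3) has the form $\pi_0(\beta) = |\sum_l \lambda_l a_l a_l^T|^{1/2}$, which is what (A1) and (A2) together suggest; for the lower bound it uses $c = 1/\sqrt{N^*}$, with $N^*$ the number of terms in $\sum^{*}$, as one admissible constant (by the Cauchy-Schwarz inequality). The design and rates are random, made-up values.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, q = 6, 3
A = rng.normal(size=(n, q))             # rows a_l^T (made-up design)
lam = rng.uniform(0.2, 3.0, size=n)     # positive rates lambda_l

# Left side of (A1): |sum_l lambda_l a_l a_l^T| = |A^T Diag(lambda) A|.
lhs = np.linalg.det(A.T @ np.diag(lam) @ A)

# Right side: one term (lambda_{i_1}...lambda_{i_q}) |A(i) A(i)^T| per q-subset i of rows.
terms = []
for idx in itertools.combinations(range(n), q):
    Ai = A[list(idx)]
    terms.append(np.prod(lam[list(idx)]) * np.linalg.det(Ai @ Ai.T))
terms = np.array(terms)
print(np.isclose(lhs, terms.sum()))

# Bounds behind (A2) and (A5), taking pi_0(beta) = lhs ** 0.5 (an assumption about (4.3)).
pos = terms[terms > 1e-12]              # the terms kept in sum*
root = np.sqrt(lhs)
upper = np.sqrt(pos).sum()              # square root of a sum <= sum of square roots
lower = upper / np.sqrt(len(pos))       # Cauchy-Schwarz: c = 1 / sqrt(N*)
print(lower <= root <= upper)
```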

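Proof of Theorem 4.2. First, as in (A1) and (A2), it can be shown that for some positive $c$ less than 1 (not depending on the parameters)

$$c \sum\nolimits^{*} (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2} |A(\mathbf{i}) A(\mathbf{i})^T|^{1/2} \le \pi_0(\beta) \le \sum\nolimits^{*} (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2} |A(\mathbf{i}) A(\mathbf{i})^T|^{1/2}. \tag{A5}$$

In view of this inequality and (4.10), the marginal $m_0(x)$ is finite if and only if

$$\int \prod_{i=k+1}^{n} e^{-\lambda_i} \lambda_i^{x_i}\, (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2} |A(\mathbf{i}) A(\mathbf{i})^T|^{1/2}\, d\beta < \infty \tag{A6}$$

for each $\mathbf{i} = (i_1, \ldots, i_q)$ for which $|A(\mathbf{i}) A(\mathbf{i})^T| > 0$.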

Note that the sufficient condition stated in the theorem and the condition that $A$ has rank $q$ imply that the regression matrix $A_+$ corresponding to the set of positive counts (with rows $a_{k+1}^T, \ldots, a_n^T$) has rank $q$.

Suppose, with no loss of generality, that $i_1 < \cdots < i_q$ in (A6). Also, suppose $i_1 < \cdots < i_u \le k < i_{u+1} < \cdots < i_q$. It is possible that $u$ may be 0 or may be $q$.



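By the assumed condition that, for $j = 1, \ldots, k$, $a_j$ can be expressed as a linear combination of $a_{k+1}, \ldots, a_n$ with nonnegative coefficients, it follows that

$$\lambda_{i_j}^{1/2} = f_j \prod_{m=k+1}^{n} \lambda_m^{c_{m i_j}}, \qquad j = 1, \ldots, u,$$

where $c_{m i_j} \ge 0$ and $f_j > 0$. Then

$$\prod_{j=1}^{u} \lambda_{i_j}^{1/2} = f \prod_{m=k+1}^{n} \lambda_m^{b_m},$$

where $b_m = \sum_{j=1}^{u} c_{m i_j} \ge 0$ and $f > 0$ are free from parameters.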
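The relations between the zero-count rates $\lambda_{i_j}$ and $\lambda_{k+1}, \ldots, \lambda_n$ used here rest only on $\log(\lambda_i) = \alpha_{0i} + a_i^T\beta$. A small check, with a made-up design and nonnegative weights $G$ (so that $a_{i_j} = \sum_m G_{jm} a_m$), confirms that $c_{m i_j} = G_{jm}/2$ and $f_j = \exp\{(\alpha_{0 i_j} - \sum_m G_{jm}\alpha_{0m})/2\}$ reproduce $\lambda_{i_j}^{1/2}$ for several values of $\beta$; these explicit expressions for $c_{m i_j}$ and $f_j$ are our own derivation, not given in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 3
A_pos = rng.normal(size=(4, q))              # rows a_{k+1}^T..a_n^T (made-up design)
alpha_pos = rng.normal(size=4)               # offsets alpha_{0m}
G = np.array([[0.5, 0.0, 1.2, 0.3],          # nonnegative weights: a_{i_j} = sum_m G[j, m] a_m
              [0.0, 0.7, 0.2, 0.0]])
A_zero = G @ A_pos                           # u = 2 zero-count vectors meeting the condition
alpha_zero = np.array([0.4, -0.1])

C = G / 2.0                                  # exponents c_{m i_j} in lambda_{i_j}^{1/2}
f_j = np.exp((alpha_zero - G @ alpha_pos) / 2.0)
b = C.sum(axis=0)                            # b_m = sum_j c_{m i_j} >= 0
f = np.prod(f_j)

ok = []
for _ in range(5):                           # several parameter values; f_j, f and b stay fixed
    beta = rng.normal(size=q)
    lam_pos = np.exp(alpha_pos + A_pos @ beta)
    lam_zero = np.exp(alpha_zero + A_zero @ beta)
    ok.append(np.allclose(np.sqrt(lam_zero), f_j * np.prod(lam_pos ** C, axis=1)))
    ok.append(np.isclose(np.sqrt(np.prod(lam_zero)), f * np.prod(lam_pos ** b)))
print(all(ok))
```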

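Then the integrand (without $|A(\mathbf{i}) A(\mathbf{i})^T|^{1/2}$) in (A6) can be simplified as

$$\prod_{i=k+1}^{n} e^{-\lambda_i} \lambda_i^{x_i}\, (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2} = f \prod_{j=u+1}^{q} e^{-\lambda_{i_j}} \lambda_{i_j}^{x_{i_j} + b_{i_j} + 1/2} \prod_{l=1}^{n+u-k-q} e^{-\lambda_{\alpha_l}} \lambda_{\alpha_l}^{x_{\alpha_l} + b_{\alpha_l}}, \tag{A7}$$

where $\{\alpha_1, \ldots, \alpha_{n+u-k-q}\} = \{k+1, \ldots, n\} \setminus \{i_{u+1}, \ldots, i_q\}$.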

Suppose $\{s_1, \ldots, s_q\} \subset \{k+1, \ldots, n\}$ is such that $\{a_{s_1}, \ldots, a_{s_q}\}$ is a linearly independent set (such a set exists since $A_+$ is of rank $q$). Note that for $y > 0$ the function $g(u) = e^{-u} u^y$ is maximized at $u = y$, implying

$$e^{-u} u^y \le e^{-y} y^y \quad \text{for all } u > 0. \tag{A8}$$
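The bound (A8) follows from $g'(u) = e^{-u}u^{y-1}(y - u)$, which is positive for $u < y$ and negative for $u > y$. A brute-force grid check of the maximizer, for a few illustrative values of $y$, is sketched below; the grid and the values of $y$ are arbitrary choices.

```python
import math

def g(u, y):
    """g(u) = e^{-u} u^y, the function used to obtain (A8)."""
    return math.exp(-u) * u ** y

grid = [i / 1000 for i in range(1, 30001)]           # u in (0, 30]
results = {}
for y in (0.5, 1.0, 3.0, 7.5):
    u_star = max(grid, key=lambda u: g(u, y))         # numerical maximiser on the grid
    worst = max(g(u, y) for u in grid) - g(y, y)      # should not exceed 0, by (A8)
    results[y] = (u_star, worst)
print(results)
```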

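By (A8) we get from (A7) that

$$\prod_{i=k+1}^{n} e^{-\lambda_i} \lambda_i^{x_i}\, (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2} \le B \prod_{j=1}^{q} e^{-\lambda_{s_j}} \lambda_{s_j}^{d_{s_j}}, \tag{A9}$$

where $B > 0$ is a constant independent of the parameters, $d_{s_j} = x_{s_j} + b_{s_j} + 1/2$ if $s_j \in \{i_{u+1}, \ldots, i_q\}$, and $d_{s_j} = x_{s_j} + b_{s_j}$ if $s_j \in \{\alpha_1, \ldots, \alpha_{n+u-k-q}\}$.

The Jacobian of the transformation from $\beta$ to $\lambda_{s_1}, \ldots, \lambda_{s_q}$ is $H / (\lambda_{s_1} \cdots \lambda_{s_q})$ for some constant $H > 0$. Then, since $d_{s_j} \ge 1$ for $j = 1, \ldots, q$ (each $x_{s_j}$ is a positive count and $b_{s_j} \ge 0$), by (A9) we have

$$\int \prod_{i=k+1}^{n} e^{-\lambda_i} \lambda_i^{x_i}\, (\lambda_{i_1} \cdots \lambda_{i_q})^{1/2}\, d\beta \le B H \prod_{j=1}^{q} \int_0^\infty e^{-\lambda} \lambda^{d_{s_j} - 1}\, d\lambda < \infty. \tag{A10}$$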

By (A10) and (A6) we conclude that $m_0(x)$, corresponding to $\pi_0(\beta)$, is finite. To prove finiteness of $m_1(x)$, corresponding to $\pi_1(\beta)$, note that by (4.10)

$$m_1(x) \le \int \Big(\prod_{i=k+1}^{n} \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!}\Big)\, \pi_1(\beta)\, d\beta.$$

Finiteness of the right-hand quantity in the last display follows from a version of Theorem 4.1 corresponding to the prior $\pi_0(\beta)$, obtained by replacing the $n$ observations from the Poisson by the $n - k$ observations from the Poisson. This completes the proof. □
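Both proofs end by reducing the marginal to products of one-dimensional integrals $\int_0^\infty e^{-\lambda}\lambda^{d-1}\,d\lambda = \Gamma(d)$, which are finite for every $d > 0$: $d = x_{i_j} + 1/2$ in (A4) and $d = d_{s_j} \ge 1$ in (A10). The sketch compares a simple midpoint quadrature with Python's `math.gamma`; the substitution $\lambda = s^2$ handles the integrable singularity at 0 for $d = 1/2$ (a zero count in (A4)). The cutoff `upper` and the step count are arbitrary choices.

```python
import math

def gamma_integral(d, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of int_0^inf exp(-lam) * lam**(d - 1) d lam, intended for d >= 1/2.

    With lam = s**2 (d lam = 2 s ds) the integrand becomes 2 exp(-s^2) s^(2d - 1),
    which is bounded near s = 0 for d >= 1/2; the range is cut at lam = upper.
    """
    h = math.sqrt(upper) / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += 2.0 * math.exp(-s * s) * s ** (2 * d - 1)
    return total * h

results = {d: (gamma_integral(d), math.gamma(d)) for d in (0.5, 1.0, 2.5, 4.0)}
for d, (approx, exact) in results.items():
    print(d, approx, exact)
```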